Overview & Applications of Large Language Models (LLMs)
From the perspective of an investor & previous early-stage operator in the space
Background
A few months ago, as I was calling my Uber back to San Francisco after a day of meetings in South Bay, routine small talk led to a lightbulb moment. I had just finished a meeting with an early-stage infrastructure startup founder, who is also one of the strongest engineers I’ve ever met. “The team is just addicted to Github’s Copilot,” he mentioned casually. Looking up from my phone, I couldn’t believe what he was saying. “But wait, I thought that was just autocomplete for people learning to code. Your whole team of experienced developers is using it every day?” I asked incredulously. “Honestly, if you threatened to take it away, we’d pay thousands of dollars per month. From an engineering ROI standpoint it would still be worth it,” he responded. We then discussed how most rote coding tasks - syntax questions you’d typically google, basic helper functions, & repetitive if/else branches - had become automated with Copilot’s suggestions, saving each of his technical teammates hours every day. Needless to say, I was late getting into my Uber. On the ride back, I reflected on what a massive accomplishment this product was - essentially, a mission-critical assistant for talented software engineers, a notoriously hard-to-hire & highly paid group.
Github’s Copilot product is an example of an application of a large language model, or “LLM”. An LLM is a deep-learning algorithm trained on enormous amounts of text data - in this case, tens of millions of public Github code repositories. Inside the interface developers use to code, Copilot suggests how a line of code should be finished, or even generates multiple lines of code from a plaintext description of what that code should do. Copilot is built on OpenAI’s Codex, an LLM that translates natural language into many popular programming languages. For context, OpenAI is a San Francisco-based artificial intelligence research company; it was founded as a non-profit in 2015, restructured as a capped-profit company in 2019, and then raised $1 billion from Microsoft (which acquired Github in 2018) to fund its research. In return, Microsoft gained exclusive access to some of OpenAI’s LLMs, including Codex.
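To make the interaction concrete, here is the kind of transformation Copilot performs: the comment is what a developer types, and the function below is the sort of completion a Codex-style model suggests. (The completion here is written by hand for illustration, not actual model output.)

```python
# What the developer types (the prompt the model sees):
# "check whether a string is a palindrome, ignoring case and spaces"

# The kind of completion a Codex-style model suggests:
def is_palindrome(s: str) -> bool:
    cleaned = "".join(c.lower() for c in s if not c.isspace())
    return cleaned == cleaned[::-1]

print(is_palindrome("Never odd or even"))  # → True
```

Multiplied across the dozens of small helper functions a developer writes in a day, this is where the hours of savings come from.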
There are a few other high-profile LLMs that you may have heard of before. Google’s LaMDA specializes in generating dialogue. Google’s long-term goal with LaMDA is to power a conversational interface that will enable customers to retrieve any kind of information (text, images, etc.) from Google’s products just by asking - basically, a very, very smart chatbot. So smart, in fact, that a Google engineer recently claimed LaMDA was sentient, which led to widespread media buzz. OpenAI’s GPT-3 is another powerful text-generation LLM that can perform a wide variety of natural language tasks, including copywriting, summarization, & classification. Its text quality is so high that in many cases it can be challenging to determine whether its output was written by a human.
As a venture capitalist, I’m always looking for the next wave of technology that will create generational business opportunities. Chris Dixon, a general partner at a16z, wrote an excellent post on the tech waves of the PC, internet, and mobile which led to most of the software we take for granted today. Importantly, each of these cycles took years for the tech to mature enough for many successful companies to be built. Given the rapid pace of innovation in machine learning (ML), especially in LLMs, I believe that we are beginning a wave where many significant companies supplying or utilizing this technology will be built. I constantly have to remind myself that “modern” ML is nascent - 2012 was the first year a convolutional neural net (a type of deep learning model) won the most popular computer vision competition. Transformers, which power the LLMs mentioned above, were introduced by Google Brain in 2017 - just a few years ago! Since then, LLMs have become more capable each year, in large part by growing larger, from BERT in 2018 with 354 million parameters (costing ~$2k to train) to LaMDA in 2021 with 137 billion parameters (costing ~$6m to train). We’ve now seen that these models have massive potential, but at a cost few companies can afford. Thankfully, companies like OpenAI offer access to these models (including GPT-3 & Codex) via API, allowing others to utilize their work.
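A quick back-of-the-envelope calculation with the rough figures above shows just how steep the scaling curve has been over those three years:

```python
# Rough figures cited above: (parameters, approximate training cost in USD)
bert_params, bert_cost = 354e6, 2_000        # BERT, 2018
lamda_params, lamda_cost = 137e9, 6_000_000  # LaMDA, 2021

param_growth = lamda_params / bert_params  # ~387x more parameters
cost_growth = lamda_cost / bert_cost       # ~3,000x more expensive to train

print(f"{param_growth:.0f}x parameters, {cost_growth:.0f}x training cost in 3 years")
```

A 3,000x jump in training cost in three years is exactly why only a handful of companies can afford to train frontier models, and why API access matters.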
Applications
I’m fascinated by all of the ways LLMs will continue to become part of our daily lives, regardless of whether the application consumes model output from others’ APIs or trains its own models for specific use cases. When thinking through whether an LLM application is feasible (or a good investment), I consider the following questions:
Data risk
How easy is it to get the data to start training the LLM? Russell Kaplan, a product leader at Scale AI, posits that “language-aligned datasets are the rate limiter for AI progress in many areas.” Basically, we have the LLMs we do now because the necessary text data is readily available on the internet, but to train other types of LLMs (predicting specific software actions, answering healthcare questions, etc.) we need to figure out a way to generate enough relevant training data.
How strong is the data moat you build & accumulate? Alex Tamkin, a PhD student at Stanford, says that companies might focus on building LLMs for specific applications where private data prevents commoditization. Examples are a hospital system with a large EHR database building a medical LLM or a company with a messaging app building an LLM chatbot. It’s important to note that the competitive advantage isn’t just the private data used to train the model initially, but the additional data you get when customers interact with the model, telling it what is right, wrong, and sometimes what the answer should be instead.
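One lightweight way to picture that feedback moat: log every user interaction with the model’s output as a labeled example for later fine-tuning. This is a minimal sketch; the record fields and file format are illustrative, not any particular company’s schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class FeedbackRecord:
    prompt: str                       # what the user asked for
    model_output: str                 # what the model produced
    accepted: bool                    # did the user keep the suggestion?
    correction: Optional[str] = None  # what the user wrote instead, if anything

def log_feedback(record: FeedbackRecord, path: str = "feedback.jsonl") -> None:
    # Append as JSON Lines, a common format for fine-tuning datasets.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

Every accepted or corrected suggestion becomes proprietary training data, which is the part of the moat a competitor scraping public data can’t replicate.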
Tech risk
Is there a proof of concept that this LLM application is feasible already (e.g. from a larger company)? And how much will it cost? Having a proof of concept is both a blessing and a curse; GPT-3 proves the feasibility (& gives an idea of cost) of other copywriting-generation startups, but it also means a more competitive market. In addition, if you decide to use the API from a large company like OpenAI to build your application & there are no alternatives, you are subject to their pricing power and product SLAs. More thoughts on how this dynamic could play out are in the conclusion of this post.
Does this LLM application have the right balance of reliability & usability for the end user? LLMs have known issues, & research is still being done to improve their accuracy & explainability on a long tail of inputs. For example, GPT-3 & Codex will occasionally output biased language & insecure or incorrect code, especially given an adversarial user. However, they are correct often enough that many users still find the models useful. Perhaps other LLM use cases aren’t accurate enough to be helpful to their customers, or the cost of training the LLM to the level of accuracy required is prohibitive. (As a side note, companies like Anthropic are working to make LLMs more reliable & understandable.)
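One common mitigation for applications that surface generated code is to validate model output before showing it to the user - for example, discarding suggestions that don’t even parse. A minimal sketch (real systems would layer on linting, tests, and security scanning):

```python
def parses_as_python(suggestion: str) -> bool:
    """Cheap first-pass filter: reject model output that isn't valid Python."""
    try:
        compile(suggestion, "<model-output>", "exec")
        return True
    except SyntaxError:
        return False

print(parses_as_python("def add(a, b):\n    return a + b"))  # valid → True
print(parses_as_python("def add(a, b:\n    return a + b"))   # broken → False
```

Filters like this don’t make the model itself more accurate, but they shift the user-visible balance toward reliability, which is often what the product decision hinges on.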
LLM necessity
How mission-critical is the LLM for the business? As discussed, data & compute can create a moat, but they can also be costly from an infrastructure, team, and prioritization standpoint. Frequently, a less sophisticated model can achieve the end result, especially if the LLM is not the core product.
With those questions in mind, here are some examples of LLM applications I’ve seen (as well as application ideas I’ve discussed) that I’m excited to share. There are many others I’ve spoken to still in stealth that I’ll be eager to mention when I can.
Copywriting
GPT-3 is the most well-known model here, but there are open-source alternatives including BLOOM (from BigScience) & Eleuther AI’s GPT-J. Startups building applications in the space include Copy.ai, Copysmith, Contenda, Cohere, & Jasper.ai, which offer products to speed up writing blog, sales, digital ad, and website copy.
Code generation & autocomplete
Codex (powering Copilot) is the most popular model, but there is an open-source alternative in Salesforce’s CodeGen. Startups building applications include Tabnine, Codiga, and Mutable AI. I chatted with 10-20 developers who have used Copilot, and though much of the feedback was positive, pain points include wanting to self-host or fine-tune their own models, customize workflows, and fix some challenges Codex has with frontend frameworks & test generation.
An idea for a related feature comes from Lee Edwards, a partner at Root VC. He imagines that these applications could suggest navigating between files given the context of what you’re doing, fulfilling a more complete vision of being an automated pair programmer.
Shell command generation
Warp, a next-gen terminal, uses GPT-3 to translate natural language into executable shell commands “like GitHub Copilot, but for the terminal.” Shell commands can be unintuitive even for experienced engineers.
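Applications like this typically wrap the user’s request in a few-shot prompt before sending it to the model. A sketch of that pattern - the example tasks, commands, and formatting here are illustrative, not Warp’s actual prompt:

```python
# Hand-picked (task, command) pairs that teach the model the output format.
FEW_SHOT = [
    ("list all files modified in the last day", "find . -mtime -1 -type f"),
    ("show the 10 largest files in this directory", "du -ah . | sort -rh | head -n 10"),
]

def build_shell_prompt(request: str) -> str:
    # The model's completion after the final "Command:" is the suggested command.
    lines = [f"Task: {nl}\nCommand: {cmd}\n" for nl, cmd in FEW_SHOT]
    lines.append(f"Task: {request}\nCommand:")
    return "\n".join(lines)

print(build_shell_prompt("count the lines in every Python file"))
```

The few-shot examples do double duty: they show the model the desired format and nudge it toward the idioms of the target shell.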
Regex generation
Autoregex.xyz uses GPT-3 to generate regular expressions from plain English (and vice versa!), a time-consuming task for developers.
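To illustrate the round trip, here is a plain-English request alongside the sort of pattern a model might return (the regex below is hand-written for illustration); checking candidate strings against it is also how an application could sanity-check a generated pattern before handing it to the user:

```python
import re

# Plain-English request: "match a US ZIP code, with an optional 4-digit extension"
generated_pattern = r"^\d{5}(-\d{4})?$"  # the sort of output a model might return

zip_re = re.compile(generated_pattern)
for candidate in ["94105", "94105-1234", "9410", "94105-12"]:
    print(candidate, "->", bool(zip_re.match(candidate)))
```

Writing this pattern by hand is easy for a regex veteran and a twenty-minute detour for everyone else, which is exactly the gap these tools target.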
SQL generation
Cogram translates plain English into database queries, allowing nontechnical users to get data and business insights without writing SQL.
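To see what this looks like in practice, here is the kind of translation such a product performs, run against a toy in-memory table. The SQL shown is the sort of query a model would generate from the English request; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)])

# English request: "total order amount per customer, largest first"
generated_sql = """
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
ORDER BY total DESC
"""
for customer, total in conn.execute(generated_sql):
    print(customer, total)
```

The appeal for nontechnical users is that the English request is the whole interface; the GROUP BY / ORDER BY mechanics stay hidden.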
Automated code reviews & code quality improvement
Codiga offers automated code reviews, & Mutable AI productionizes Jupyter notebooks.
Database query optimization
Ottertune identifies & helps fix database problems, such as cache misses and missing indexes, that can cause unexpected issues. I’m not sure if Ottertune uses LLMs for this, but I’ve discussed this with others as a potential LLM use case.
DevOps assistance / automation
I’ve been talking with a few infrastructure engineers who have ideas in this area. Questions I have for potential applications: is there a way to speed up configuring Terraform, catch DevOps bugs, or suggest cost-saving improvements with LLMs? Could you make products like Docker or Jenkins easier to get started with & use for people with less expertise?
Frontend / website generation
Pygma turns Figma designs into high quality code. Salesforce’s long-term vision with CodeGen involves allowing a user to have a conversation to design & generate a website.
Product requirements documentation (PRD) generation
Monterey is building a “co-pilot for product development,” perhaps involving LLMs at some point. From my time as a PM, a bunch of the documentation I wrote probably could have been auto-generated from code or other information.
Product insights
Enterprise search
Personalized recommendations
Naver’s HyperCLOVA not only powers search but also enables many features on Naver’s e-commerce platform including “summarizing multiple consumer reviews into one line, recommending and curating products to user shopping preferences, or generating marketing phrases for featured shopping collections.” Similarly, Shaped AI offers ranking models for discovery pages, feeds, & recommendations.
Chatbot / support agent assist
LaMDA, Rasa, Cohere, Forethought & Cresta power chatbots or improve customer support agents’ efficiency.
General software tool assistant
Adept AI’s vision is to suggest workflow steps for any software, basically becoming a universal copilot / assistant. There’s an awesome demo here showing early results. Character AI & Inflection AI may also be building in this space from their home page descriptions, but little can currently be found about them online.
Personalized tutoring
Grammar correction & style
Duolingo, Writer.com, & Grammarly provide smart writing assistants.
Translation
Meta has done research to translate 204 different languages, twice as many as ever before attempted, at a higher quality than previously achieved. Tangentially, I’ve chatted with friends about ideas for generating audio in any one language to help learn any other language with large models.
Personal decision making
Oogway helps individuals organize their choices to make better quality decisions.
Food for thought
An important question for the LLM applications that don’t own the model themselves is the long-term outcome of LLM infrastructure - will it all be commoditized by many providers offering similar models, or will the most cutting-edge company (with the best engineers, hardware, data, compute, & community) become a gatekeeper? Separately, for the LLM applications that own the models themselves, will it remain feasible to keep paying training & inference costs (even if they continue to decrease) as the needs of their customers & the research evolve? Or, will most of these companies switch to using others’ LLM infrastructure, similar to the cloud provider dynamic today? Only time will tell - there’s no consensus here among experts! I created a very oversimplified meme of what might happen, but I do not have conviction in which “galaxy brain” is the correct end state:
Other interesting long-term scenarios to consider are a new entrant whose research or technology makes training radically cheaper, as well as the long-term theoretical risk of LLMs becoming so good that we as a population forget how or why we do something. Related areas I’m interested in (perhaps for future posts) include diving into the ML infrastructure details (some of my thoughts on general ML infrastructure are here), looking at applications of other types of big models (image, video, multimodal, etc.), and the AI safety implications of all this rapid progress. Finally, I need to mention that this whole blog post was generated using an LLM… just kidding (for now).
P.S. — If you’re working on or thinking about ideas in the large model space, I’d love to chat! I’m on Twitter here.
Thanks to Pete from Data Council, Bonny & Ming for the meme inspiration, all of the people I’ve had discussions with about LLMs, & the authors of all the sources I’ve cited.
another solid post with nice breadth! thx for sharing.
an emerging cool B2C application is fiction writing, with tools like Sudowrite. The Verge had a piece today: https://www.theverge.com/c/23194235/ai-fiction-writing-amazon-kindle-sudowrite-jasper
really cool Leigh! large language models will grow a lot in the next 5 years
we presented a "Copilot for Splunk" at Splunk's annual conference (https://splunkbase.splunk.com/app/6410/)
here's our technical blog: https://www.splunk.com/en_us/blog/it/training-a-copilot-for-splunk-spl-and-increasing-model-throughput-by-5x-with-nvidia-morpheus.html